Prosper Loans Exploration by Gabor Sar

Analysis

Data Overview

## [1] 113937     81
##  [1] "ListingKey"                         
##  [2] "ListingNumber"                      
##  [3] "ListingCreationDate"                
##  [4] "CreditGrade"                        
##  [5] "Term"                               
##  [6] "LoanStatus"                         
##  [7] "ClosedDate"                         
##  [8] "BorrowerAPR"                        
##  [9] "BorrowerRate"                       
## [10] "LenderYield"                        
## [11] "EstimatedEffectiveYield"            
## [12] "EstimatedLoss"                      
## [13] "EstimatedReturn"                    
## [14] "ProsperRating..numeric."            
## [15] "ProsperRating..Alpha."              
## [16] "ProsperScore"                       
## [17] "ListingCategory..numeric."          
## [18] "BorrowerState"                      
## [19] "Occupation"                         
## [20] "EmploymentStatus"                   
## [21] "EmploymentStatusDuration"           
## [22] "IsBorrowerHomeowner"                
## [23] "CurrentlyInGroup"                   
## [24] "GroupKey"                           
## [25] "DateCreditPulled"                   
## [26] "CreditScoreRangeLower"              
## [27] "CreditScoreRangeUpper"              
## [28] "FirstRecordedCreditLine"            
## [29] "CurrentCreditLines"                 
## [30] "OpenCreditLines"                    
## [31] "TotalCreditLinespast7years"         
## [32] "OpenRevolvingAccounts"              
## [33] "OpenRevolvingMonthlyPayment"        
## [34] "InquiriesLast6Months"               
## [35] "TotalInquiries"                     
## [36] "CurrentDelinquencies"               
## [37] "AmountDelinquent"                   
## [38] "DelinquenciesLast7Years"            
## [39] "PublicRecordsLast10Years"           
## [40] "PublicRecordsLast12Months"          
## [41] "RevolvingCreditBalance"             
## [42] "BankcardUtilization"                
## [43] "AvailableBankcardCredit"            
## [44] "TotalTrades"                        
## [45] "TradesNeverDelinquent..percentage." 
## [46] "TradesOpenedLast6Months"            
## [47] "DebtToIncomeRatio"                  
## [48] "IncomeRange"                        
## [49] "IncomeVerifiable"                   
## [50] "StatedMonthlyIncome"                
## [51] "LoanKey"                            
## [52] "TotalProsperLoans"                  
## [53] "TotalProsperPaymentsBilled"         
## [54] "OnTimeProsperPayments"              
## [55] "ProsperPaymentsLessThanOneMonthLate"
## [56] "ProsperPaymentsOneMonthPlusLate"    
## [57] "ProsperPrincipalBorrowed"           
## [58] "ProsperPrincipalOutstanding"        
## [59] "ScorexChangeAtTimeOfListing"        
## [60] "LoanCurrentDaysDelinquent"          
## [61] "LoanFirstDefaultedCycleNumber"      
## [62] "LoanMonthsSinceOrigination"         
## [63] "LoanNumber"                         
## [64] "LoanOriginalAmount"                 
## [65] "LoanOriginationDate"                
## [66] "LoanOriginationQuarter"             
## [67] "MemberKey"                          
## [68] "MonthlyLoanPayment"                 
## [69] "LP_CustomerPayments"                
## [70] "LP_CustomerPrincipalPayments"       
## [71] "LP_InterestandFees"                 
## [72] "LP_ServiceFees"                     
## [73] "LP_CollectionFees"                  
## [74] "LP_GrossPrincipalLoss"              
## [75] "LP_NetPrincipalLoss"                
## [76] "LP_NonPrincipalRecoverypayments"    
## [77] "PercentFunded"                      
## [78] "Recommendations"                    
## [79] "InvestmentFromFriendsCount"         
## [80] "InvestmentFromFriendsAmount"        
## [81] "Investors"

There are 113937 loans in the dataset with 81 features.

## 'data.frame':    113937 obs. of  81 variables:
##  $ ListingKey                         : Factor w/ 113066 levels "00003546482094282EF90E5",..: 7180 7193 6647 6669 6686 6689 6699 6706 6687 6687 ...
##  $ ListingNumber                      : int  193129 1209647 81716 658116 909464 1074836 750899 768193 1023355 1023355 ...
##  $ ListingCreationDate                : Factor w/ 113064 levels "2005-11-09 20:44:28.847000000",..: 14184 111894 6429 64760 85967 100310 72556 74019 97834 97834 ...
##  $ CreditGrade                        : Factor w/ 9 levels "","A","AA","B",..: 5 1 8 1 1 1 1 1 1 1 ...
##  $ Term                               : int  36 36 36 36 36 60 36 36 36 36 ...
##  $ LoanStatus                         : Factor w/ 12 levels "Cancelled","Chargedoff",..: 3 4 3 4 4 4 4 4 4 4 ...
##  $ ClosedDate                         : Factor w/ 2803 levels "","2005-11-25 00:00:00",..: 1138 1 1263 1 1 1 1 1 1 1 ...
##  $ BorrowerAPR                        : num  0.165 0.12 0.283 0.125 0.246 ...
##  $ BorrowerRate                       : num  0.158 0.092 0.275 0.0974 0.2085 ...
##  $ LenderYield                        : num  0.138 0.082 0.24 0.0874 0.1985 ...
##  $ EstimatedEffectiveYield            : num  NA 0.0796 NA 0.0849 0.1832 ...
##  $ EstimatedLoss                      : num  NA 0.0249 NA 0.0249 0.0925 ...
##  $ EstimatedReturn                    : num  NA 0.0547 NA 0.06 0.0907 ...
##  $ ProsperRating..numeric.            : int  NA 6 NA 6 3 5 2 4 7 7 ...
##  $ ProsperRating..Alpha.              : Factor w/ 8 levels "","A","AA","B",..: 1 2 1 2 6 4 7 5 3 3 ...
##  $ ProsperScore                       : num  NA 7 NA 9 4 10 2 4 9 11 ...
##  $ ListingCategory..numeric.          : int  0 2 0 16 2 1 1 2 7 7 ...
##  $ BorrowerState                      : Factor w/ 52 levels "","AK","AL","AR",..: 7 7 12 12 25 34 18 6 16 16 ...
##  $ Occupation                         : Factor w/ 68 levels "","Accountant/CPA",..: 37 43 37 52 21 43 50 29 24 24 ...
##  $ EmploymentStatus                   : Factor w/ 9 levels "","Employed",..: 9 2 4 2 2 2 2 2 2 2 ...
##  $ EmploymentStatusDuration           : int  2 44 NA 113 44 82 172 103 269 269 ...
##  $ IsBorrowerHomeowner                : Factor w/ 2 levels "False","True": 2 1 1 2 2 2 1 1 2 2 ...
##  $ CurrentlyInGroup                   : Factor w/ 2 levels "False","True": 2 1 2 1 1 1 1 1 1 1 ...
##  $ GroupKey                           : Factor w/ 707 levels "","00343376901312423168731",..: 1 1 335 1 1 1 1 1 1 1 ...
##  $ DateCreditPulled                   : Factor w/ 112992 levels "2005-11-09 00:30:04.487000000",..: 14347 111883 6446 64724 85857 100382 72500 73937 97888 97888 ...
##  $ CreditScoreRangeLower              : int  640 680 480 800 680 740 680 700 820 820 ...
##  $ CreditScoreRangeUpper              : int  659 699 499 819 699 759 699 719 839 839 ...
##  $ FirstRecordedCreditLine            : Factor w/ 11586 levels "","1947-08-24 00:00:00",..: 8639 6617 8927 2247 9498 497 8265 7685 5543 5543 ...
##  $ CurrentCreditLines                 : int  5 14 NA 5 19 21 10 6 17 17 ...
##  $ OpenCreditLines                    : int  4 14 NA 5 19 17 7 6 16 16 ...
##  $ TotalCreditLinespast7years         : int  12 29 3 29 49 49 20 10 32 32 ...
##  $ OpenRevolvingAccounts              : int  1 13 0 7 6 13 6 5 12 12 ...
##  $ OpenRevolvingMonthlyPayment        : num  24 389 0 115 220 1410 214 101 219 219 ...
##  $ InquiriesLast6Months               : int  3 3 0 0 1 0 0 3 1 1 ...
##  $ TotalInquiries                     : num  3 5 1 1 9 2 0 16 6 6 ...
##  $ CurrentDelinquencies               : int  2 0 1 4 0 0 0 0 0 0 ...
##  $ AmountDelinquent                   : num  472 0 NA 10056 0 ...
##  $ DelinquenciesLast7Years            : int  4 0 0 14 0 0 0 0 0 0 ...
##  $ PublicRecordsLast10Years           : int  0 1 0 0 0 0 0 1 0 0 ...
##  $ PublicRecordsLast12Months          : int  0 0 NA 0 0 0 0 0 0 0 ...
##  $ RevolvingCreditBalance             : num  0 3989 NA 1444 6193 ...
##  $ BankcardUtilization                : num  0 0.21 NA 0.04 0.81 0.39 0.72 0.13 0.11 0.11 ...
##  $ AvailableBankcardCredit            : num  1500 10266 NA 30754 695 ...
##  $ TotalTrades                        : num  11 29 NA 26 39 47 16 10 29 29 ...
##  $ TradesNeverDelinquent..percentage. : num  0.81 1 NA 0.76 0.95 1 0.68 0.8 1 1 ...
##  $ TradesOpenedLast6Months            : num  0 2 NA 0 2 0 0 0 1 1 ...
##  $ DebtToIncomeRatio                  : num  0.17 0.18 0.06 0.15 0.26 0.36 0.27 0.24 0.25 0.25 ...
##  $ IncomeRange                        : Factor w/ 8 levels "$0","$1-24,999",..: 4 5 7 4 3 3 4 4 4 4 ...
##  $ IncomeVerifiable                   : Factor w/ 2 levels "False","True": 2 2 2 2 2 2 2 2 2 2 ...
##  $ StatedMonthlyIncome                : num  3083 6125 2083 2875 9583 ...
##  $ LoanKey                            : Factor w/ 113066 levels "00003683605746079487FF7",..: 100337 69837 46303 70776 71387 86505 91250 5425 908 908 ...
##  $ TotalProsperLoans                  : int  NA NA NA NA 1 NA NA NA NA NA ...
##  $ TotalProsperPaymentsBilled         : int  NA NA NA NA 11 NA NA NA NA NA ...
##  $ OnTimeProsperPayments              : int  NA NA NA NA 11 NA NA NA NA NA ...
##  $ ProsperPaymentsLessThanOneMonthLate: int  NA NA NA NA 0 NA NA NA NA NA ...
##  $ ProsperPaymentsOneMonthPlusLate    : int  NA NA NA NA 0 NA NA NA NA NA ...
##  $ ProsperPrincipalBorrowed           : num  NA NA NA NA 11000 NA NA NA NA NA ...
##  $ ProsperPrincipalOutstanding        : num  NA NA NA NA 9948 ...
##  $ ScorexChangeAtTimeOfListing        : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ LoanCurrentDaysDelinquent          : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ LoanFirstDefaultedCycleNumber      : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ LoanMonthsSinceOrigination         : int  78 0 86 16 6 3 11 10 3 3 ...
##  $ LoanNumber                         : int  19141 134815 6466 77296 102670 123257 88353 90051 121268 121268 ...
##  $ LoanOriginalAmount                 : int  9425 10000 3001 10000 15000 15000 3000 10000 10000 10000 ...
##  $ LoanOriginationDate                : Factor w/ 1873 levels "2005-11-15 00:00:00",..: 426 1866 260 1535 1757 1821 1649 1666 1813 1813 ...
##  $ LoanOriginationQuarter             : Factor w/ 33 levels "Q1 2006","Q1 2007",..: 18 8 2 32 24 33 16 16 33 33 ...
##  $ MemberKey                          : Factor w/ 90831 levels "00003397697413387CAF966",..: 11071 10302 33781 54939 19465 48037 60448 40951 26129 26129 ...
##  $ MonthlyLoanPayment                 : num  330 319 123 321 564 ...
##  $ LP_CustomerPayments                : num  11396 0 4187 5143 2820 ...
##  $ LP_CustomerPrincipalPayments       : num  9425 0 3001 4091 1563 ...
##  $ LP_InterestandFees                 : num  1971 0 1186 1052 1257 ...
##  $ LP_ServiceFees                     : num  -133.2 0 -24.2 -108 -60.3 ...
##  $ LP_CollectionFees                  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ LP_GrossPrincipalLoss              : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ LP_NetPrincipalLoss                : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ LP_NonPrincipalRecoverypayments    : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ PercentFunded                      : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ Recommendations                    : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ InvestmentFromFriendsCount         : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ InvestmentFromFriendsAmount        : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Investors                          : int  258 1 41 158 20 1 1 1 1 1 ...

ListingCategory..numeric. contains integer codes. Converting it to a factor with descriptive levels would be very useful, because tables and plots would then show category names instead of numbers.

I am going to map the following numbers to the following levels:

Number Level
0 Not Available
1 Debt Consolidation
2 Home Improvement
3 Business
4 Personal Loan
5 Student Use
6 Auto
7 Other
8 Baby&Adoption
9 Boat
10 Cosmetic Procedure
11 Engagement Ring
12 Green Loans
13 Household Expenses
14 Large Purchases
15 Medical/Dental
16 Motorcycle
17 RV
18 Taxes
19 Vacation
20 Wedding Loans
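
The mapping above can be sketched in base R as follows. This is a minimal illustration, not necessarily the code used for this report: the vector `codes` holds the first ten values of ListingCategory..numeric. from the structure listing, and the names `codes`, `category_labels` and `listing_category` are assumptions made for the example.

```r
# First ten category codes as shown in the structure listing
codes <- c(0, 2, 0, 16, 2, 1, 1, 2, 7, 7)

# Labels in code order (0 to 20), copied from the mapping table
category_labels <- c(
  "Not Available", "Debt Consolidation", "Home Improvement", "Business",
  "Personal Loan", "Student Use", "Auto", "Other", "Baby&Adoption", "Boat",
  "Cosmetic Procedure", "Engagement Ring", "Green Loans",
  "Household Expenses", "Large Purchases", "Medical/Dental", "Motorcycle",
  "RV", "Taxes", "Vacation", "Wedding Loans"
)

# levels fixes the order of the codes, labels attaches the names
listing_category <- factor(codes, levels = 0:20, labels = category_labels)
table(listing_category)
```

Passing `levels = 0:20` explicitly keeps categories that do not occur in the data as empty levels, so the level order stays stable.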

ListingCreationDate, ClosedDate and LoanOriginationDate are factors. Converting them to dates could make plotting by date easier.
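
One possible way to do the conversion is sketched below. The sample values follow the timestamp format visible in the structure listing; the empty string stands in for a loan without a closing date. The variable names are assumptions for the example.

```r
# Sample factor in the same format as ClosedDate
closed <- factor(c("2005-11-25 00:00:00", "", "2014-03-10 00:00:00"))

# Work on the character values, keep only the YYYY-MM-DD part,
# and parse with an explicit format so empty strings become NA
closed_chr <- as.character(closed)
closed_date <- as.Date(substr(closed_chr, 1, 10), format = "%Y-%m-%d")

closed_date
```

This keeps only the calendar date and drops the time of day, which is enough for plotting by date.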

Univariate Analysis

##                    ListingKey     ListingNumber     ListingCreationDate 
##  17A93590655669644DB4C06:     6   Min.   :      4   Min.   :2005-11-09  
##  349D3587495831350F0F648:     4   1st Qu.: 400919   1st Qu.:2008-09-19  
##  47C1359638497431975670B:     4   Median : 600554   Median :2012-06-16  
##  8474358854651984137201C:     4   Mean   : 627886   Mean   :2011-07-08  
##  DE8535960513435199406CE:     4   3rd Qu.: 892634   3rd Qu.:2013-09-09  
##  04C13599434217079754AEE:     3   Max.   :1255725   Max.   :2014-03-10  
##  (Other)                :113912                                         
##   CreditGrade         Term                       LoanStatus   
##         :84984   Min.   :12.00   Current              :56576  
##  C      : 5649   1st Qu.:36.00   Completed            :38074  
##  D      : 5153   Median :36.00   Chargedoff           :11992  
##  B      : 4389   Mean   :40.83   Defaulted            : 5018  
##  AA     : 3509   3rd Qu.:36.00   Past Due (1-15 days) :  806  
##  HR     : 3508   Max.   :60.00   Past Due (31-60 days):  363  
##  (Other): 6745                   (Other)              : 1108  
##    ClosedDate          BorrowerAPR       BorrowerRate     LenderYield     
##  Min.   :2005-11-25   Min.   :0.00653   Min.   :0.0000   Min.   :-0.0100  
##  1st Qu.:2009-07-14   1st Qu.:0.15629   1st Qu.:0.1340   1st Qu.: 0.1242  
##  Median :2011-04-05   Median :0.20976   Median :0.1840   Median : 0.1730  
##  Mean   :2011-03-07   Mean   :0.21883   Mean   :0.1928   Mean   : 0.1827  
##  3rd Qu.:2013-01-30   3rd Qu.:0.28381   3rd Qu.:0.2500   3rd Qu.: 0.2400  
##  Max.   :2014-03-10   Max.   :0.51229   Max.   :0.4975   Max.   : 0.4925  
##  NA's   :58848        NA's   :25                                          
##  EstimatedEffectiveYield EstimatedLoss   EstimatedReturn 
##  Min.   :-0.183          Min.   :0.005   Min.   :-0.183  
##  1st Qu.: 0.116          1st Qu.:0.042   1st Qu.: 0.074  
##  Median : 0.162          Median :0.072   Median : 0.092  
##  Mean   : 0.169          Mean   :0.080   Mean   : 0.096  
##  3rd Qu.: 0.224          3rd Qu.:0.112   3rd Qu.: 0.117  
##  Max.   : 0.320          Max.   :0.366   Max.   : 0.284  
##  NA's   :29084           NA's   :29084   NA's   :29084   
##  ProsperRating..numeric. ProsperRating..Alpha.  ProsperScore  
##  Min.   :1.000                  :29084         Min.   : 1.00  
##  1st Qu.:3.000           C      :18345         1st Qu.: 4.00  
##  Median :4.000           B      :15581         Median : 6.00  
##  Mean   :4.072           A      :14551         Mean   : 5.95  
##  3rd Qu.:5.000           D      :14274         3rd Qu.: 8.00  
##  Max.   :7.000           E      : 9795         Max.   :11.00  
##  NA's   :29084           (Other):12307         NA's   :29084  
##  ListingCategory..numeric. BorrowerState  
##  Min.   : 0.000            CA     :14717  
##  1st Qu.: 1.000            TX     : 6842  
##  Median : 1.000            NY     : 6729  
##  Mean   : 2.774            FL     : 6720  
##  3rd Qu.: 3.000            IL     : 5921  
##  Max.   :20.000                   : 5515  
##                            (Other):67493  
##                     Occupation         EmploymentStatus
##  Other                   :28617   Employed     :67322  
##  Professional            :13628   Full-time    :26355  
##  Computer Programmer     : 4478   Self-employed: 6134  
##  Executive               : 4311   Not available: 5347  
##  Teacher                 : 3759   Other        : 3806  
##  Administrative Assistant: 3688                : 2255  
##  (Other)                 :55456   (Other)      : 2718  
##  EmploymentStatusDuration IsBorrowerHomeowner CurrentlyInGroup
##  Min.   :  0.00           False:56459         False:101218    
##  1st Qu.: 26.00           True :57478         True : 12719    
##  Median : 67.00                                               
##  Mean   : 96.07                                               
##  3rd Qu.:137.00                                               
##  Max.   :755.00                                               
##  NA's   :7625                                                 
##                     GroupKey                 DateCreditPulled 
##                         :100596   2013-12-23 09:38:12:     6  
##  783C3371218786870A73D20:  1140   2013-11-21 09:09:41:     4  
##  3D4D3366260257624AB272D:   916   2013-12-06 05:43:16:     4  
##  6A3B336601725506917317E:   698   2014-01-14 20:17:49:     4  
##  FEF83377364176536637E50:   611   2014-02-09 12:14:41:     4  
##  C9643379247860156A00EC0:   342   2013-09-27 22:04:54:     3  
##  (Other)                :  9634   (Other)            :113912  
##  CreditScoreRangeLower CreditScoreRangeUpper
##  Min.   :  0.0         Min.   : 19.0        
##  1st Qu.:660.0         1st Qu.:679.0        
##  Median :680.0         Median :699.0        
##  Mean   :685.6         Mean   :704.6        
##  3rd Qu.:720.0         3rd Qu.:739.0        
##  Max.   :880.0         Max.   :899.0        
##  NA's   :591           NA's   :591          
##         FirstRecordedCreditLine CurrentCreditLines OpenCreditLines
##                     :   697     Min.   : 0.00      Min.   : 0.00  
##  1993-12-01 00:00:00:   185     1st Qu.: 7.00      1st Qu.: 6.00  
##  1994-11-01 00:00:00:   178     Median :10.00      Median : 9.00  
##  1995-11-01 00:00:00:   168     Mean   :10.32      Mean   : 9.26  
##  1990-04-01 00:00:00:   161     3rd Qu.:13.00      3rd Qu.:12.00  
##  1995-03-01 00:00:00:   159     Max.   :59.00      Max.   :54.00  
##  (Other)            :112389     NA's   :7604       NA's   :7604   
##  TotalCreditLinespast7years OpenRevolvingAccounts
##  Min.   :  2.00             Min.   : 0.00        
##  1st Qu.: 17.00             1st Qu.: 4.00        
##  Median : 25.00             Median : 6.00        
##  Mean   : 26.75             Mean   : 6.97        
##  3rd Qu.: 35.00             3rd Qu.: 9.00        
##  Max.   :136.00             Max.   :51.00        
##  NA's   :697                                     
##  OpenRevolvingMonthlyPayment InquiriesLast6Months TotalInquiries   
##  Min.   :    0.0             Min.   :  0.000      Min.   :  0.000  
##  1st Qu.:  114.0             1st Qu.:  0.000      1st Qu.:  2.000  
##  Median :  271.0             Median :  1.000      Median :  4.000  
##  Mean   :  398.3             Mean   :  1.435      Mean   :  5.584  
##  3rd Qu.:  525.0             3rd Qu.:  2.000      3rd Qu.:  7.000  
##  Max.   :14985.0             Max.   :105.000      Max.   :379.000  
##                              NA's   :697          NA's   :1159     
##  CurrentDelinquencies AmountDelinquent   DelinquenciesLast7Years
##  Min.   : 0.0000      Min.   :     0.0   Min.   : 0.000         
##  1st Qu.: 0.0000      1st Qu.:     0.0   1st Qu.: 0.000         
##  Median : 0.0000      Median :     0.0   Median : 0.000         
##  Mean   : 0.5921      Mean   :   984.5   Mean   : 4.155         
##  3rd Qu.: 0.0000      3rd Qu.:     0.0   3rd Qu.: 3.000         
##  Max.   :83.0000      Max.   :463881.0   Max.   :99.000         
##  NA's   :697          NA's   :7622       NA's   :990            
##  PublicRecordsLast10Years PublicRecordsLast12Months RevolvingCreditBalance
##  Min.   : 0.0000          Min.   : 0.000            Min.   :      0       
##  1st Qu.: 0.0000          1st Qu.: 0.000            1st Qu.:   3121       
##  Median : 0.0000          Median : 0.000            Median :   8549       
##  Mean   : 0.3126          Mean   : 0.015            Mean   :  17599       
##  3rd Qu.: 0.0000          3rd Qu.: 0.000            3rd Qu.:  19521       
##  Max.   :38.0000          Max.   :20.000            Max.   :1435667       
##  NA's   :697              NA's   :7604              NA's   :7604          
##  BankcardUtilization AvailableBankcardCredit  TotalTrades    
##  Min.   :0.000       Min.   :     0          Min.   :  0.00  
##  1st Qu.:0.310       1st Qu.:   880          1st Qu.: 15.00  
##  Median :0.600       Median :  4100          Median : 22.00  
##  Mean   :0.561       Mean   : 11210          Mean   : 23.23  
##  3rd Qu.:0.840       3rd Qu.: 13180          3rd Qu.: 30.00  
##  Max.   :5.950       Max.   :646285          Max.   :126.00  
##  NA's   :7604        NA's   :7544            NA's   :7544    
##  TradesNeverDelinquent..percentage. TradesOpenedLast6Months
##  Min.   :0.000                      Min.   : 0.000         
##  1st Qu.:0.820                      1st Qu.: 0.000         
##  Median :0.940                      Median : 0.000         
##  Mean   :0.886                      Mean   : 0.802         
##  3rd Qu.:1.000                      3rd Qu.: 1.000         
##  Max.   :1.000                      Max.   :20.000         
##  NA's   :7544                       NA's   :7544           
##  DebtToIncomeRatio         IncomeRange    IncomeVerifiable
##  Min.   : 0.000    $25,000-49,999:32192   False:  8669    
##  1st Qu.: 0.140    $50,000-74,999:31050   True :105268    
##  Median : 0.220    $100,000+     :17337                   
##  Mean   : 0.276    $75,000-99,999:16916                   
##  3rd Qu.: 0.320    Not displayed : 7741                   
##  Max.   :10.010    $1-24,999     : 7274                   
##  NA's   :8554      (Other)       : 1427                   
##  StatedMonthlyIncome                    LoanKey       TotalProsperLoans
##  Min.   :      0     CB1B37030986463208432A1:     6   Min.   :0.00     
##  1st Qu.:   3200     2DEE3698211017519D7333F:     4   1st Qu.:1.00     
##  Median :   4667     9F4B37043517554537C364C:     4   Median :1.00     
##  Mean   :   5608     D895370150591392337ED6D:     4   Mean   :1.42     
##  3rd Qu.:   6825     E6FB37073953690388BC56D:     4   3rd Qu.:2.00     
##  Max.   :1750003     0D8F37036734373301ED419:     3   Max.   :8.00     
##                      (Other)                :113912   NA's   :91852    
##  TotalProsperPaymentsBilled OnTimeProsperPayments
##  Min.   :  0.00             Min.   :  0.00       
##  1st Qu.:  9.00             1st Qu.:  9.00       
##  Median : 16.00             Median : 15.00       
##  Mean   : 22.93             Mean   : 22.27       
##  3rd Qu.: 33.00             3rd Qu.: 32.00       
##  Max.   :141.00             Max.   :141.00       
##  NA's   :91852              NA's   :91852        
##  ProsperPaymentsLessThanOneMonthLate ProsperPaymentsOneMonthPlusLate
##  Min.   : 0.00                       Min.   : 0.00                  
##  1st Qu.: 0.00                       1st Qu.: 0.00                  
##  Median : 0.00                       Median : 0.00                  
##  Mean   : 0.61                       Mean   : 0.05                  
##  3rd Qu.: 0.00                       3rd Qu.: 0.00                  
##  Max.   :42.00                       Max.   :21.00                  
##  NA's   :91852                       NA's   :91852                  
##  ProsperPrincipalBorrowed ProsperPrincipalOutstanding
##  Min.   :    0            Min.   :    0              
##  1st Qu.: 3500            1st Qu.:    0              
##  Median : 6000            Median : 1627              
##  Mean   : 8472            Mean   : 2930              
##  3rd Qu.:11000            3rd Qu.: 4127              
##  Max.   :72499            Max.   :23451              
##  NA's   :91852            NA's   :91852              
##  ScorexChangeAtTimeOfListing LoanCurrentDaysDelinquent
##  Min.   :-209.00             Min.   :   0.0           
##  1st Qu.: -35.00             1st Qu.:   0.0           
##  Median :  -3.00             Median :   0.0           
##  Mean   :  -3.22             Mean   : 152.8           
##  3rd Qu.:  25.00             3rd Qu.:   0.0           
##  Max.   : 286.00             Max.   :2704.0           
##  NA's   :95009                                        
##  LoanFirstDefaultedCycleNumber LoanMonthsSinceOrigination   LoanNumber    
##  Min.   : 0.00                 Min.   :  0.0              Min.   :     1  
##  1st Qu.: 9.00                 1st Qu.:  6.0              1st Qu.: 37332  
##  Median :14.00                 Median : 21.0              Median : 68599  
##  Mean   :16.27                 Mean   : 31.9              Mean   : 69444  
##  3rd Qu.:22.00                 3rd Qu.: 65.0              3rd Qu.:101901  
##  Max.   :44.00                 Max.   :100.0              Max.   :136486  
##  NA's   :96985                                                            
##  LoanOriginalAmount LoanOriginationDate  LoanOriginationQuarter
##  Min.   : 1000      Min.   :2005-11-15   Q4 2013:14450         
##  1st Qu.: 4000      1st Qu.:2008-10-02   Q1 2014:12172         
##  Median : 6500      Median :2012-06-26   Q3 2013: 9180         
##  Mean   : 8337      Mean   :2011-07-21   Q2 2013: 7099         
##  3rd Qu.:12000      3rd Qu.:2013-09-18   Q3 2012: 5632         
##  Max.   :35000      Max.   :2014-03-12   Q2 2012: 5061         
##                                          (Other):60343         
##                    MemberKey      MonthlyLoanPayment LP_CustomerPayments
##  63CA34120866140639431C9:     9   Min.   :   0.0     Min.   :   -2.35   
##  16083364744933457E57FB9:     8   1st Qu.: 131.6     1st Qu.: 1005.76   
##  3A2F3380477699707C81385:     8   Median : 217.7     Median : 2583.83   
##  4D9C3403302047712AD0CDD:     8   Mean   : 272.5     Mean   : 4183.08   
##  739C338135235294782AE75:     8   3rd Qu.: 371.6     3rd Qu.: 5548.40   
##  7E1733653050264822FAA3D:     8   Max.   :2251.5     Max.   :40702.39   
##  (Other)                :113888                                         
##  LP_CustomerPrincipalPayments LP_InterestandFees LP_ServiceFees   
##  Min.   :    0.0              Min.   :   -2.35   Min.   :-664.87  
##  1st Qu.:  500.9              1st Qu.:  274.87   1st Qu.: -73.18  
##  Median : 1587.5              Median :  700.84   Median : -34.44  
##  Mean   : 3105.5              Mean   : 1077.54   Mean   : -54.73  
##  3rd Qu.: 4000.0              3rd Qu.: 1458.54   3rd Qu.: -13.92  
##  Max.   :35000.0              Max.   :15617.03   Max.   :  32.06  
##                                                                   
##  LP_CollectionFees  LP_GrossPrincipalLoss LP_NetPrincipalLoss
##  Min.   :-9274.75   Min.   :  -94.2       Min.   : -954.5    
##  1st Qu.:    0.00   1st Qu.:    0.0       1st Qu.:    0.0    
##  Median :    0.00   Median :    0.0       Median :    0.0    
##  Mean   :  -14.24   Mean   :  700.4       Mean   :  681.4    
##  3rd Qu.:    0.00   3rd Qu.:    0.0       3rd Qu.:    0.0    
##  Max.   :    0.00   Max.   :25000.0       Max.   :25000.0    
##                                                              
##  LP_NonPrincipalRecoverypayments PercentFunded    Recommendations   
##  Min.   :    0.00                Min.   :0.7000   Min.   : 0.00000  
##  1st Qu.:    0.00                1st Qu.:1.0000   1st Qu.: 0.00000  
##  Median :    0.00                Median :1.0000   Median : 0.00000  
##  Mean   :   25.14                Mean   :0.9986   Mean   : 0.04803  
##  3rd Qu.:    0.00                3rd Qu.:1.0000   3rd Qu.: 0.00000  
##  Max.   :21117.90                Max.   :1.0125   Max.   :39.00000  
##                                                                     
##  InvestmentFromFriendsCount InvestmentFromFriendsAmount   Investors      
##  Min.   : 0.00000           Min.   :    0.00            Min.   :   1.00  
##  1st Qu.: 0.00000           1st Qu.:    0.00            1st Qu.:   2.00  
##  Median : 0.00000           Median :    0.00            Median :  44.00  
##  Mean   : 0.02346           Mean   :   16.55            Mean   :  80.48  
##  3rd Qu.: 0.00000           3rd Qu.:    0.00            3rd Qu.: 115.00  
##  Max.   :33.00000           Max.   :25000.00            Max.   :1189.00  
##                                                                          
##            ListingCategory 
##  Debt Consolidation:58308  
##  Not Available     :16965  
##  Other             :10494  
##  Home Improvement  : 7433  
##  Business          : 7189  
##  Auto              : 2572  
##  (Other)           :10976

Most of the loans (83%) have a status of Current or Completed.

The estimated return ranges from -0.183 to 0.284, with a median of 0.092. Half of the values lie between 0.074 and 0.117, so most estimates sit in a narrow positive band.

Most of the loans (51%) are for debt consolidation.

At least 88% of the borrowers are employed (Employed, Full-time or Self-employed), and their median employment duration is 67 months.
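
The employment share can be recomputed from the counts in the summary output. The sketch below assumes that Employed, Full-time and Self-employed are the statuses counted as employed; any employed borrowers hidden in the "(Other)" row are not included, which is why the figure is a lower bound.

```r
# Counts copied from the EmploymentStatus column of the summary output
employed_counts <- c(Employed = 67322, `Full-time` = 26355,
                     `Self-employed` = 6134)
total_loans <- 113937

# Share of loans whose borrower falls in one of the three statuses
share_employed <- sum(employed_counts) / total_loans
round(100 * share_employed, 1)
```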

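The median upper credit score range is 699, and the maximum is 899. The lower and upper credit score range values are very close to each other: every summary statistic of the upper bound is exactly 19 points above the corresponding lower-bound value.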
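
How close the two bounds are can be checked directly. The sketch below uses the first ten values of CreditScoreRangeLower and CreditScoreRangeUpper from the structure listing; it only illustrates the check and does not cover the full dataset.

```r
# First ten values of each column, as shown in the structure listing
lower <- c(640, 680, 480, 800, 680, 740, 680, 700, 820, 820)
upper <- c(659, 699, 499, 819, 699, 759, 699, 719, 839, 839)

# Width of the score range for each loan
range_width <- upper - lower
unique(range_width)
```

If the width is constant, the two columns carry the same information and one of them is enough for the analysis.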

The median stated monthly income is $4,667, and the maximum is $1,750,003. The maximum is about 375 times the median, which points to extreme outliers. 92% of the incomes are verifiable.

Loan amounts range from $1,000 to $35,000, and the median is $6,500.

The median monthly loan payment is $217.70, and the maximum is $2,251.50, roughly ten times the median.

Loan Original Amount

##    4000   15000   10000    5000    2000    3000   25000   20000    1000 
##   14333   12407   11106    6990    6067    5749    3630    3291    3206 
##    2500 (Other) 
##    2992   44166

Most of the loan amount values are between $1,000 and $10,000. Some values are much more frequent than others. Five of the ten most frequent values are multiples of $5,000 ($5,000, $10,000, $15,000, $20,000 and $25,000), which suggests peaks at round amounts. With the binwidth set to 5000, the histogram shows a decreasing trend.
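
A frequency table and a binned count like the ones described here can be produced in base R. The amounts below are hypothetical values chosen for illustration, and the report itself may have used a different plotting approach.

```r
# Hypothetical loan amounts, for illustration only
amounts <- c(4000, 4000, 4000, 15000, 15000, 10000, 10000, 5000, 2000,
             3000, 25000, 1000, 2500, 7500, 35000)

# Most frequent values first
top_values <- sort(table(amounts), decreasing = TRUE)
head(top_values, 3)

# Bin counts with a width of 5000, computed without drawing the plot
bins <- hist(amounts, breaks = seq(0, 35000, by = 5000), plot = FALSE)
data.frame(upper_edge = bins$breaks[-1], count = bins$counts)
```

With the default settings of `hist()`, each bin includes its upper edge, so a loan of exactly 5000 is counted in the first bin.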

Loan Origination Date

The number of loans increases from 2006 to late 2008 and again from late 2009 to 2014, with a gap between late 2008 and late 2009. Zooming in on that time frame shows that almost no loans were registered over a period of approximately 10 months. The most likely reason for this anomaly is the subprime mortgage crisis.
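
Zooming in on a time window can also be done numerically by counting loans per month. The dates below are hypothetical and only illustrate the technique; the window boundaries are assumptions for the example.

```r
# Hypothetical origination dates, for illustration only
orig_dates <- as.Date(c("2008-08-14", "2008-09-02", "2008-09-20",
                        "2008-10-05", "2009-07-21", "2009-08-03",
                        "2009-08-30"))

# Restrict to the window of interest
in_window <- orig_dates[orig_dates >= as.Date("2008-07-01") &
                        orig_dates <= as.Date("2009-12-31")]

# List every month in the window so that empty months show up as 0
all_months <- format(seq(as.Date("2008-07-01"), as.Date("2009-12-01"),
                         by = "month"), "%Y-%m")
monthly <- table(factor(format(in_window, "%Y-%m"), levels = all_months))
monthly
```

Using a factor with all months as levels matters here: a plain `table()` would silently omit the months without loans, which are exactly the ones of interest.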

Term

## 
##    12    36    60 
##  1614 87778 24545

About 77% of the loans have a term of 36 months (3 years), 22% a term of 60 months (5 years) and 1% a term of 12 months (1 year).
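
The shares can be recomputed from the counts in the table above. This is a small sketch; the names `term_counts` and `term_share` are chosen for the example.

```r
# Counts per term, copied from the table above
term_counts <- c(`12` = 1614, `36` = 87778, `60` = 24545)

# Convert counts to proportions and express them as percentages
term_share <- prop.table(term_counts)
round(100 * term_share, 1)
```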

Monthly Loan Payment

## 
##  FALSE   TRUE 
## 113002    935
## 
##              Cancelled             Chargedoff              Completed 
##                      0                      0                    800 
##                Current              Defaulted FinalPaymentInProgress 
##                      0                    131                      4 
##   Past Due (>120 days)   Past Due (1-15 days)  Past Due (16-30 days) 
##                      0                      0                      0 
##  Past Due (31-60 days)  Past Due (61-90 days) Past Due (91-120 days) 
##                      0                      0                      0

The majority of the monthly payments are between $0 and $500, with some outliers. 935 loans have a monthly payment of $0. All of those loans are completed, defaulted or have their final payment in progress.
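
The two tables above follow a simple pattern: build a logical flag, then tabulate another column for the flagged rows. The miniature data frame below is hypothetical and only reuses the column names MonthlyLoanPayment and LoanStatus from the dataset.

```r
# Hypothetical miniature of the loan table, for illustration only
loans <- data.frame(
  MonthlyLoanPayment = c(330.43, 0, 123.32, 0, 0, 563.97),
  LoanStatus = c("Current", "Completed", "Completed",
                 "Defaulted", "FinalPaymentInProgress", "Current"),
  stringsAsFactors = TRUE
)

# Flag loans with a zero monthly payment and count them
zero_payment <- loans$MonthlyLoanPayment == 0
table(zero_payment)

# Status of the flagged loans; unused factor levels are kept with count 0
status_counts <- table(loans$LoanStatus[zero_payment])
status_counts
```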

Stated Monthly Income

The histogram of monthly incomes shows a serious outlier issue. This confirms the earlier observation about the large gap between the median ($4,667) and the maximum ($1,750,003).

## 
##  FALSE   TRUE 
## 112543   1394

Limiting the values to the 0.99 quantile and setting the binwidth to 150 gives a much more readable, right-skewed distribution. What remains unexplained is the high frequency (1,394) of zero values.
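
Trimming at a quantile before binning can be sketched as follows. The income vector is hypothetical (an evenly spaced sequence plus one extreme value) and the names are assumptions for the example; only the 0.99 quantile and the bin width of 150 come from the text.

```r
# Hypothetical incomes with one extreme value, for illustration only
income <- c(seq(0, 9990, by = 10), 1750003)

# Keep only values at or below the 0.99 quantile
cutoff <- quantile(income, 0.99, na.rm = TRUE)
trimmed <- income[income <= cutoff]

# Bin the remaining values with a width of 150, without drawing the plot
bins <- hist(trimmed, breaks = seq(0, max(trimmed) + 150, by = 150),
             plot = FALSE)
head(bins$counts)
```

Trimming only affects what is displayed; the excluded values stay in the data and still deserve a separate look.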

## 
##      Not Available Debt Consolidation   Home Improvement 
##                238                358                 47 
##           Business      Personal Loan        Student Use 
##                249                121                 67 
##               Auto              Other      Baby&Adoption 
##                 28                183                  1 
##               Boat Cosmetic Procedure    Engagement Ring 
##                  1                  1                  4 
##        Green Loans Household Expenses    Large Purchases 
##                  2                 56                  6 
##     Medical/Dental         Motorcycle                 RV 
##                 17                  0                  0 
##              Taxes           Vacation      Wedding Loans 
##                  4                  8                  3

Listing category does not explain the zero values.

## 
##              Cancelled             Chargedoff              Completed 
##                      1                    347                    677 
##                Current              Defaulted FinalPaymentInProgress 
##                    259                     80                      0 
##   Past Due (>120 days)   Past Due (1-15 days)  Past Due (16-30 days) 
##                      1                     11                      5 
##  Past Due (31-60 days)  Past Due (61-90 days) Past Due (91-120 days) 
##                      3                      7                      3

Most of these loans are completed, charged off or defaulted, but still 259 of them are current.

## 
## False  True 
##  1330    64

Only 64 of these loans have a verifiable monthly income.

Debt to Income Ratio

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   0.000   0.140   0.220   0.276   0.320  10.010    8554
## 
##  FALSE   TRUE 
## 104948    435

The middle half of the debt-to-income ratio values lies between 0.14 and 0.32 (the first and third quartiles). There are some outliers: the maximum is 10.01, and there are 435 values greater than 2.5.
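The FALSE/TRUE counts above sum to 105383 rather than 113937 because `table()` drops missing values by default, and 8554 ratios are NA. A small sketch with a made-up `ratio` vector (hypothetical values) shows the behaviour:

```r
# Hypothetical debt-to-income ratios, including two missing values.
ratio <- c(0.14, 0.22, 0.32, 3.1, NA, 10.01, NA, 0.05)

# table() silently ignores the NA comparisons...
above <- table(ratio > 2.5)

# ...unless asked to report them explicitly.
above_with_na <- table(ratio > 2.5, useNA = "ifany")

# The same bookkeeping applied to the counts printed above.
counts_are_consistent <- (104948 + 435) == (113937 - 8554)
```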

Credit Score Range Lower

## 
##     0   360   420   440   460   480   500   520   540   560   580   600 
##   133     1     5    36   141   346   554  1593  1474  1357  1125  3602 
##   620   640   660   680   700   720   740   760   780   800   820   840 
##  4172 12199 16366 16492 15471 12923  9267  6606  4624  2644  1409   567 
##   860   880 
##   212    27

Most of the lower credit score range values are between 600 and 800. There are some outliers.

Credit Score Range Upper

## 
##    19   379   439   459   479   499   519   539   559   579   599   619 
##   133     1     5    36   141   346   554  1593  1474  1357  1125  3602 
##   639   659   679   699   719   739   759   779   799   819   839   859 
##  4172 12199 16366 16492 15471 12923  9267  6606  4624  2644  1409   567 
##   879   899 
##   212    27

Most of the upper credit score range values are between 600 and 800. There are some outliers.

The counts are identical to those of the lower credit score range, and each upper value is exactly 19 points above the corresponding lower value.
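The relationship between the two score columns can be checked directly from the distinct values in the two tables. The sketch below copies those values; the commented-out line shows how the same check might look on the full data, assuming a data frame named `loans` (a hypothetical name).

```r
# Distinct values copied from the two frequency tables above.
lower <- c(0, 360, 420, seq(440, 880, by = 20))
upper <- c(19, 379, 439, seq(459, 899, by = 20))

# Every upper bound is its lower bound plus a constant offset.
offset <- unique(upper - lower)

# A constant offset implies a perfect linear correlation.
r <- cor(lower, upper)

# On the full data (hypothetical data frame name `loans`):
# all(loans$CreditScoreRangeUpper - loans$CreditScoreRangeLower == 19, na.rm = TRUE)
```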

Borrower Rate

## Source: local data frame [2,294 x 2]
## 
##    BorrowerRate count
## 1        0.3177  3672
## 2        0.3500  1905
## 3        0.3199  1651
## 4        0.2900  1508
## 5        0.2699  1319
## 6        0.1500  1182
## 7        0.1400  1035
## 8        0.1099   949
## 9        0.2000   907
## 10       0.1585   806
## ..          ...   ...

Most borrower rates are between 0.1 and 0.3. The values at 0.32 and 0.35 are unexpectedly frequent.

Borrower APR

## Source: local data frame [6,678 x 2]
## 
##    BorrowerAPR count
## 1      0.35797  3672
## 2      0.35643  1644
## 3      0.37453  1260
## 4      0.30532   902
## 5      0.29510   747
## 6      0.35356   721
## 7      0.29776   707
## 8      0.15833   652
## 9      0.24246   605
## 10     0.24758   601
## ..         ...   ...

Most borrower APR values are between 0.1 and 0.3. The values around 0.36 (0.35797 and 0.35643) and 0.37 (0.37453) are unexpectedly frequent.

It seems like there is a strong relationship between borrower rate and borrower APR.

I would like to know if there is any explanation of those unexpectedly frequent values.

Employment Status

## 
##                    Employed     Full-time Not available  Not employed 
##          2255         67322         26355          5347           835 
##         Other     Part-time       Retired Self-employed 
##          3806          1088           795          6134

89% of the borrowers are employed or retired (Employed, Full-time, Part-time, Retired, Self-employed), fewer than 1% are not employed, and there is no useful employment information for 10% of them (NA, Not available, Other).
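The percentages can be reproduced from the counts in the table above. In this sketch the unnamed category is labelled `(blank)`, and the group and function names are my own, chosen for illustration.

```r
# Counts copied from the employment status table above.
counts <- c(
  "(blank)" = 2255, "Employed" = 67322, "Full-time" = 26355,
  "Not available" = 5347, "Not employed" = 835, "Other" = 3806,
  "Part-time" = 1088, "Retired" = 795, "Self-employed" = 6134
)

working_or_retired <- c("Employed", "Full-time", "Part-time",
                        "Retired", "Self-employed")
no_information <- c("(blank)", "Not available", "Other")

# Percentage of all loans that fall into the given categories.
share <- function(groups) 100 * sum(counts[groups]) / sum(counts)

shares <- c(
  working_or_retired = share(working_or_retired),
  not_employed       = share("Not employed"),
  no_information     = share(no_information)
)
round(shares, 1)
```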

Employment Status Duration

Most of the employment status durations are between 0 and 200 months, and the distribution shows a decreasing trend.

Estimated Return

## 
## FALSE  TRUE 
## 84658   195

Most of the estimated returns are between 0.05 and 0.15. There are some outliers, and 195 values are negative.

Listing Category

## 
##      Not Available Debt Consolidation   Home Improvement 
##              16965              58308               7433 
##           Business      Personal Loan        Student Use 
##               7189               2395                756 
##               Auto              Other      Baby&Adoption 
##               2572              10494                199 
##               Boat Cosmetic Procedure    Engagement Ring 
##                 85                 91                217 
##        Green Loans Household Expenses    Large Purchases 
##                 59               1996                876 
##     Medical/Dental         Motorcycle                 RV 
##               1522                304                 52 
##              Taxes           Vacation      Wedding Loans 
##                885                768                771

Most loans have a listing category of debt consolidation. A lot of loans do not have a useful listing category (Not Available, Other).

Loan Status

## 
##              Cancelled             Chargedoff              Completed 
##                      5                  11992                  38074 
##                Current              Defaulted FinalPaymentInProgress 
##                  56576                   5018                    205 
##   Past Due (>120 days)   Past Due (1-15 days)  Past Due (16-30 days) 
##                     16                    806                    265 
##  Past Due (31-60 days)  Past Due (61-90 days) Past Due (91-120 days) 
##                    363                    313                    304

Most of the loans have a loan status of current, completed, charged off or defaulted.

Bivariate and Multivariate Analysis

##                          LoanOriginalAmount        Term MonthlyLoanPayment
## LoanOriginalAmount               1.00000000  0.33892746         0.93198368
## Term                             0.33892746  1.00000000         0.09102578
## MonthlyLoanPayment               0.93198368  0.09102578         1.00000000
## StatedMonthlyIncome              0.20125947  0.02847925         0.19683026
## DebtToIncomeRatio                0.01011222 -0.01467005         0.02759840
## CreditScoreRangeLower            0.34087445  0.12626345         0.29253205
## CreditScoreRangeUpper            0.34087445  0.12626345         0.29253205
## EmploymentStatusDuration         0.09814935  0.08247591         0.08116016
## EstimatedReturn                 -0.28611751  0.15250541        -0.25273127
## BorrowerRate                    -0.32895995  0.02008537        -0.24474235
## BorrowerAPR                     -0.32288669 -0.01118347        -0.22665287
##                          StatedMonthlyIncome DebtToIncomeRatio
## LoanOriginalAmount                0.20125947        0.01011222
## Term                              0.02847925       -0.01467005
## MonthlyLoanPayment                0.19683026        0.02759840
## StatedMonthlyIncome               1.00000000       -0.12265939
## DebtToIncomeRatio                -0.12265939        1.00000000
## CreditScoreRangeLower             0.10790082       -0.01316852
## CreditScoreRangeUpper             0.10790082       -0.01316852
## EmploymentStatusDuration          0.06983037       -0.01160926
## EstimatedReturn                  -0.07501281        0.08723617
## BorrowerRate                     -0.08898180        0.06291678
## BorrowerAPR                      -0.08233849        0.05632742
##                          CreditScoreRangeLower CreditScoreRangeUpper
## LoanOriginalAmount                  0.34087445            0.34087445
## Term                                0.12626345            0.12626345
## MonthlyLoanPayment                  0.29253205            0.29253205
## StatedMonthlyIncome                 0.10790082            0.10790082
## DebtToIncomeRatio                  -0.01316852           -0.01316852
## CreditScoreRangeLower               1.00000000            1.00000000
## CreditScoreRangeUpper               1.00000000            1.00000000
## EmploymentStatusDuration            0.08113411            0.08113411
## EstimatedReturn                    -0.34623273           -0.34623273
## BorrowerRate                       -0.46156668           -0.46156668
## BorrowerAPR                        -0.42970732           -0.42970732
##                          EmploymentStatusDuration EstimatedReturn
## LoanOriginalAmount                    0.098149347     -0.28611751
## Term                                  0.082475906      0.15250541
## MonthlyLoanPayment                    0.081160161     -0.25273127
## StatedMonthlyIncome                   0.069830374     -0.07501281
## DebtToIncomeRatio                    -0.011609265      0.08723617
## CreditScoreRangeLower                 0.081134109     -0.34623273
## CreditScoreRangeUpper                 0.081134109     -0.34623273
## EmploymentStatusDuration              1.000000000     -0.03648651
## EstimatedReturn                      -0.036486506      1.00000000
## BorrowerRate                         -0.019907440      0.81766987
## BorrowerAPR                          -0.008588601      0.79427520
##                          BorrowerRate  BorrowerAPR
## LoanOriginalAmount        -0.32895995 -0.322886690
## Term                       0.02008537 -0.011183469
## MonthlyLoanPayment        -0.24474235 -0.226652867
## StatedMonthlyIncome       -0.08898180 -0.082338491
## DebtToIncomeRatio          0.06291678  0.056327417
## CreditScoreRangeLower     -0.46156668 -0.429707322
## CreditScoreRangeUpper     -0.46156668 -0.429707322
## EmploymentStatusDuration  -0.01990744 -0.008588601
## EstimatedReturn            0.81766987  0.794275198
## BorrowerRate               1.00000000  0.989823970
## BorrowerAPR                0.98982397  1.000000000

Original amount correlates strongly with monthly payment (0.93) and moderately with term (0.34) and with credit score range lower and upper (0.34 each).

Credit score range lower and credit score range upper are perfectly correlated (1.00). Borrower rate and borrower APR also have a very strong relationship (0.99). This means I can omit credit score range upper and borrower APR. I am going to use credit score range lower and borrower rate only.

I would like to see how original amount and monthly payment depend on other variables, and I am also interested in how different variables change over time.
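One way to make the choice of redundant variables reproducible is to flag every pair whose absolute correlation exceeds a threshold. The sketch below uses synthetic data and an arbitrary threshold of 0.95; both are assumptions for illustration, not part of the original analysis.

```r
# Synthetic stand-in data with two deliberately redundant pairs.
set.seed(1)
n <- 200
lower <- sample(seq(600, 800, by = 20), n, replace = TRUE)
rate <- runif(n, 0.05, 0.35)
df <- data.frame(
  CreditScoreRangeLower = lower,
  CreditScoreRangeUpper = lower + 19,
  BorrowerRate = rate,
  BorrowerAPR = rate + 0.02 + rnorm(n, sd = 0.005)
)

# Pairwise correlations, tolerating missing values.
cm <- cor(df, use = "pairwise.complete.obs")

# Blank out the lower triangle and diagonal so each pair appears once.
cm[lower.tri(cm, diag = TRUE)] <- NA

# Pairs above the (assumed) threshold: keep the first, drop the second.
idx <- which(abs(cm) > 0.95, arr.ind = TRUE)
redundant_pairs <- data.frame(
  keep = rownames(cm)[idx[, "row"]],
  drop = colnames(cm)[idx[, "col"]]
)

reduced <- df[, !(names(df) %in% redundant_pairs$drop), drop = FALSE]
```

With this column order the sketch keeps the lower score and the borrower rate, matching the choice made above.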

Relationships of Loan Original Amount and Monthly Loan Payment

There are three strong linear relationships between original amount and monthly payment.

This means that original amount alone cannot describe the variation of monthly payment; there must be something else contributing to it. Term on its own correlates only weakly with monthly payment (0.09), but it correlates with original amount (0.34) and has exactly three possible values, which matches the three lines in the plot. Term therefore seems the best next variable to investigate.

## 
##  Pearson's product-moment correlation
## 
## data:  MonthlyLoanPayment and LoanOriginalAmount/Term
## t = 1386.25, df = 113935, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.9712849 0.9719350
## sample estimates:
##       cor 
## 0.9716118

Monthly loan payment is almost equal to the original amount divided by the term of the loan (correlation 0.97). Therefore monthly loan payment can be described well by original amount and term together.
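The remaining gap between the monthly payment and amount divided by term is plausibly interest. The sketch below uses the standard fixed-rate amortization formula; that Prosper computes MonthlyLoanPayment this way is my assumption, and `monthly_payment` is a hypothetical helper, not something taken from the dataset.

```r
# Standard annuity formula: payment = P * i / (1 - (1 + i)^(-n)),
# where i is the monthly rate and n the number of monthly payments.
# Assumes a nominal annual rate compounded monthly.
monthly_payment <- function(principal, annual_rate, term_months) {
  i <- annual_rate / 12
  ifelse(
    i == 0,
    principal / term_months,                        # no interest: plain division
    principal * i / (1 - (1 + i)^(-term_months))    # amortized payment
  )
}

# Illustrative loan: $10,000 at a 15% annual rate over 36 months.
with_interest <- monthly_payment(10000, 0.15, 36)
without_interest <- 10000 / 36

# The ratio shows how far the simple division underestimates the payment.
ratio <- with_interest / without_interest
```

Under this assumption the payment always exceeds amount divided by term for a positive rate, which would also explain why borrower rate shows up as an additional factor later in the analysis.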

Let’s see how credit score and borrower rate values affect original amount and monthly payment.

Despite the moderate correlations, I cannot see any clear relationship between credit score range lower or borrower rate on the one hand and original amount or monthly payment on the other.

The last plot seems overplotted; it may be worth revisiting it with a lower alpha level.

With the decreased alpha level I can see some underlying trend, but it is still not clear enough.

Adding term to the plot may help again.

Faceting by term shows some differences between the previously seen trends.

As both term and original amount are related to monthly loan payment, adding original amount to the previous plot may make the picture clearer.

Faceting by term and colouring by original amount shows a complex relationship, where borrower rate, original amount and term all contribute to the monthly payment.

I am going to check if stated monthly income can add anything to the picture.

This plot also makes it clearly visible that some original amount values are more frequent than others (approximately every multiple of $5,000), but other than that it does not help much.
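The clustering at round amounts can be quantified rather than read off the plot. A minimal sketch, using a made-up `amounts` vector in place of LoanOriginalAmount:

```r
# Hypothetical loan amounts (illustrative values only).
amounts <- c(1000, 4000, 5000, 7500, 10000, 10000,
             15000, 15000, 15000, 3500, 20000, 25000)

# Flag amounts that are exact multiples of $5,000.
is_round <- amounts %% 5000 == 0

# Share of loans sitting on a round amount.
round_share <- mean(is_round)

# Frequency of each round amount, most common first.
round_counts <- sort(table(amounts[is_round]), decreasing = TRUE)
```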

There is a relationship between monthly loan payment and listing category. Loans for Debt Consolidation have the highest monthly payment, followed by Not Available, Home Improvement, Business and Other.

Changes in Variables Over Time

All the loans before the previously mentioned gap (between late 2008 and late 2009) have a 36-month term. 12-month loans only occurred for a two-year period between 2011 and 2013 and were not popular. 60-month loans started to occur at the same time as 12-month loans (in 2011), and their number has been increasing since then.

There is no useful employment status information before late 2006. The Employed category almost replaced the Full-time category in late 2010; this is probably due to a change in the way the data was recorded.

The employment status durations show an increasing trend, but it is not strongly related to the changes in the employment status.

Let’s see if there is any relationship between employment status and employment status duration.

There is a relationship between the employment status and the employment status duration. Employed, Full-time, Retired and Self-employed statuses tend to have a higher duration than Not employed or Part-time.

There is no useful listing category information before 2008. New listing categories were introduced in 2008 and 2012 as well. This indicates a change in the way the data was recorded, similarly to employment status.

There is no visible pattern within listing categories.

Final Plots

First Plot

There is a very strong relationship between the term, the original amount and the monthly payment of a loan. The latter can be calculated from the former two. The monthly payment of a loan increases as the term decreases (short-term loans have higher monthly payments than long-term loans) and increases as the original amount increases.

Second Plot

Monthly loan payment is affected by the borrower rate: a higher borrower rate leads to a higher monthly payment. This relationship is not affected by either the term or the original amount of a loan.

Third Plot

The variation of the monthly loan payment differs across listing categories. Some categories have a wider monthly payment range (e.g., Debt Consolidation), others have a narrower one (e.g., Motorcycle).

Reflection

The Prosper loan dataset contains 113937 observations of 83 variables from 2006 to 2014. At the beginning of my investigation I tried to understand the meaning of the different variables, find issues with how they are stored, and work out how I could make the best use of them. As the dataset contains a lot of variables, it was challenging to decide where to start. In the end I chose to start by investigating the variables that best represent a loan (original amount, term and monthly payment) and everything else that could have a relationship with them.

First I plotted each variable one by one to get an initial view of what the dataset contains and how good its quality is. I was very surprised to see loans with a monthly payment of 0, but later I realised that those loans are no longer current. It was also unexpected that there were no loans in the dataset between late 2008 and late 2009, but after some research I understood that the reason for the gap is the subprime mortgage crisis.

Later, while analysing the relationships between the different variables, I realised that the monthly payment is the most important property of a loan, rather than the original amount as I had thought before. I found that the term and the original amount of a loan are the most relevant parameters for calculating the monthly payment. Borrower rate plays an important role as well, although it was quite hard to visualise that relationship. I also found that both the monthly payment and its variation differ across listing categories.

Analysing the changes of the variables over time showed me that there were a few changes in the way the data was collected. That made it hard to use some categorical variables, such as employment status. Standardising these categorical variables would allow a future investigation to make better use of them.

Investigating the dataset without the loans that are for debt consolidation could give a good picture of the choices of the borrowers. Investigating the loans that are paid off through debt consolidation would be useful as well.